Schemes for learning and behaviour: a new expectancy model

Author

  • Christopher Mark Witkowski
Abstract

This thesis presents a novel form of learning by reinforcement. Existing reinforcement learning algorithms rely on externally provided reward signals to drive learning; the new algorithm instead relies on reinforcing signals generated internally. The algorithm described here, SRS/E, generates expectancies (μ-hypotheses), each of which gives rise to a specific prediction when the conditions relevant to that expectancy are encountered (the μ-experiment). The algorithm then tests these predictions against actual events, generating reinforcement signals that corroborate or reject individual expectancies. This procedure allows self-contained, completely unsupervised learning to an extent not possible with previous reinforcement procedures. The SRS/E algorithm is derived from a number of postulates that constitute a new Dynamic Expectancy Model developed in this thesis. In contrast to existing Q-learning based reinforcement algorithms, which generate a static policy map and so limit learning to one goal, SRS/E generates a Dynamic Policy Map (DPM) from learned expectancies whenever the system selects a new goal. This approach retains the reactivity to the environment inherent in existing reinforcement algorithms, while substantially increasing the system’s flexibility in responding to varying circumstances and requirements. Also in contrast to previous reinforcement systems, goals may be selected arbitrarily and are not limited to those associated with reward during learning. Multiple goals can therefore be pursued either simultaneously or sequentially. A single SRS/E implementation has been compared directly with the published results of a family of reinforcement-based algorithms, Dyna-PI, Dyna-Q and Dyna-Q+ (Sutton, 1990), themselves extensions of the groundbreaking Q-learning algorithm (Watkins, 1989).
Under equivalent “ideal learning conditions”, the SRS/E algorithm was found to outperform the equivalent Dyna reinforcement program at learning a simple maze task by a factor of some 40:1. The SRS/E learning algorithm was also found to be robust when tested under controlled “noise” conditions. SRS/E was also compared directly with Sutton’s Dyna-Q+ algorithm on a range of alternative-path and route-blocking tasks and was found to offer similar performance, while employing a “biologically plausible” extinction mechanism that mirrors findings from animal behaviour research. Finally, SRS/E was tested with experimental designs for “latent learning” and “place learning” drawn directly from animal learning research, both of which are regarded as presenting severe challenges to conventional reinforcement learning theories. SRS/E performs well on both tasks, in a manner consistent with findings from animal experiments.
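To make the expectancy mechanism described in the abstract more concrete, here is a minimal Python sketch. It is not the SRS/E implementation, and it rests on several stated assumptions. An expectancy is reduced to a hypothetical `Expectancy(context, action, outcome)` triple with a `confidence` value. Each test of a prediction against the observed outcome moves `confidence` towards 1 (corroborated) or 0 (rejected) via an exponential moving average at an assumed `rate` of 0.2. A dynamic policy map is approximated by a breadth-first search backwards from whichever goal is selected, using only expectancies at or above an assumed `threshold`. The thesis's actual corroboration measure and DPM construction are not reproduced here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Expectancy:
    """Hypothetical expectancy: in `context`, doing `action` is predicted to yield `outcome`."""
    context: str
    action: str
    outcome: str
    confidence: float = 0.5  # assumed initial confidence

    def update(self, observed: str, rate: float = 0.2) -> None:
        """Internally generated reinforcement: compare the prediction with the observed outcome.

        A match moves confidence towards 1 (corroboration); a mismatch moves it towards 0.
        """
        target = 1.0 if observed == self.outcome else 0.0
        self.confidence += rate * (target - self.confidence)


def dynamic_policy_map(expectancies, goal, threshold=0.5):
    """Build a {state: action} map towards `goal` by backward breadth-first search.

    Only expectancies whose confidence is at least `threshold` are trusted. Each state
    receives the first action of a shortest predicted path to the goal.
    """
    policy = {}
    reached = {goal}
    frontier = deque([goal])
    while frontier:
        target = frontier.popleft()
        for e in expectancies:
            if e.outcome == target and e.confidence >= threshold and e.context not in reached:
                policy[e.context] = e.action
                reached.add(e.context)
                frontier.append(e.context)
    return policy


# Usage: a three-state corridor A -> B -> C, plus one wrong expectancy (A, left) -> C.
expectancies = [
    Expectancy("A", "right", "B"),
    Expectancy("B", "right", "C"),
    Expectancy("A", "left", "C"),
]
world = {("A", "right"): "B", ("B", "right"): "C", ("A", "left"): "A"}  # true transitions

for _ in range(5):  # repeated tests of each prediction against actual events
    for e in expectancies:
        e.update(world[(e.context, e.action)])

print(dynamic_policy_map(expectancies, "C"))  # {'B': 'right', 'A': 'right'}
print(dynamic_policy_map(expectancies, "B"))  # {'A': 'right'}
```

In this sketch the wrong expectancy falls below the threshold and is simply ignored. Because the map is rebuilt from the stored expectancies on each call, switching the goal from `"C"` to `"B"` needs no further learning. This loosely illustrates the contrast the abstract draws with a single-goal policy map.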

Similar sources

Information seeking in inquiry-based learning pedagogy: Proposing a preliminary model

Background and Aim: This study attempts to propose a suggestive model for theorising in the field of Inquiry-Based Information Behaviour (IBiB). Method: To achieve the research aim, Piaget’s Cognitive Development Theory, Dewey’s Constructivist Theory, as well as IBL Pedagogy were analysed. Taking into account the current information behaviour models and theories which are developed based on th...

A 'Uses and Gratification Expectancy Model' to Predict Students' 'Perceived e-Learning Experience'

This study investigates ‘how and why’ students’ ‘Uses and Gratification Expectancy’ (UGE) for e-learning resources influences their ‘Perceived e-Learning Experience.’ A ‘Uses and Gratification Expectancy Model’ (UGEM) framework is proposed to predict students’ ‘Perceived e-Learning Experience,’ and their uses and gratifications for electronic media in a blended learning strategy. The study util...

Investigating farmers’ management behaviours during drought as preventive responses: a case study of Dehloran County

Farmers in developing countries are among the most vulnerable to climate change effects, particularly drought. Drought is a serious and dangerous phenomenon in most parts of the world, particularly in arid and semi-arid regions such as Iran, and the Middle East is expected to be particularly badly affected, with a decline in precipitation of at least 40mm over the coming century. In S...

Indoor Positioning and Pre-processing of RSS Measurements

Rapid expansions of new location-based services signify the need for accurate localization techniques for indoor environments. Among different techniques, RSS-based schemes, and in particular one of their variants based on Graph-based Semi-Supervised Learning (G-SSL), are widely used approaches. The superiority of this scheme is that it has low setup/training cost and at the same ti...

Publication date: 1997